Abstract- Image inpainting is the process of reconstructing lost or deteriorated parts of images and videos. It is an
important problem in computer vision with many imaging and graphics applications, e.g. restoring old photos and
videos, automatic scene editing, denoising, compression and image-based rendering. Traditional image inpainting
methods, which are mostly based on machine learning models, work well for background inpainting, but they cannot
hallucinate novel image content for challenging tasks such as inpainting faces and complex objects, and they fail to
capture high-level object semantics. It has been found that simply adding a small amount of noise to the original data
can readily mislead most mainstream neural networks into misclassifying items. This is because most machine
learning models learn from only a small quantity of data and their input-to-output mapping is nearly linear, which is
a major disadvantage and leads to overfitting. The present method uses Generative Adversarial Networks (GANs), a
type of generative modelling that employs deep learning techniques such as convolutional neural networks. GANs can
learn from unstructured or unlabeled data through feature extraction that is fully automatic and more reliable. The
CelebFaces Attributes dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images,
each with 40 attribute annotations.

1. INTRODUCTION

The task of filling empty pixels in a picture, also known as image inpainting or completion, is crucial in computer
vision. It can be used in photo editing, image-based rendering, and computational photography, among other things
[1]. The most difficult aspect of image inpainting is creating visually realistic and semantically plausible pixels for
missing regions that are consistent with the existing pixels. Many non-deep-learning approaches exist, such as
diffusion-based and patch-based methods, exemplar-based image inpainting, patch offset statistics for image
completion, and Content-Aware Fill in Adobe Photoshop. These work well for background inpainting and are
frequently used in practical applications. The primary idea of patch-based approaches is to fill in the missing region,
working from its boundary. Barnes proposed a method that searches for matching patches and fills in the missing
parts of the image with patches taken from the rest of the image. As a result, the texture information is more
appropriate. However, this approach fails in complex situations, such as inpainting faces and natural images, where
the result is blurry. Exemplar-based methods are poor at filling in gaps that contain intricate structures [2], because
the texture synthesis method is inefficient. They cannot hallucinate unique image content when the regions to be
inpainted involve complicated, non-repetitive structures (e.g. faces, objects), since they presume that the missing
patches can be found somewhere in the background regions [3].
TABLE I. Comparison of diffusion-, texture-, hybrid-, and learning-based image inpainting models

S.No | [illegible] | Diffusion-based models | Texture-based models | Hybrid-based models | Learning-based models
1 | Size of the hole | Small | Large | Large | Large
2 | Data to be filled | Geometrical | Texture | Structure and texture | Structure and texture
3 | Images required | One | One/Two | One | Image database
4 | [illegible] | Mostly high | Mostly medium | Mostly low | Very high
5 | Applications | [illegible] scratches, noise, [illegible] reconstruction of lost blocks | Large texture generation, removal of objects, and covering of surface | Removal of objects, recovery of large degraded region, and editing of an image | Removal of object, restoration of large damaged area, and image editing


The proposed method uses GANs, which train a generative model by framing the problem as a supervised learning
problem with two sub-models: the generator model, which is trained to generate new examples, and the discriminator
model, which tries to classify examples as real (from the domain) or fake (generated) [4]. Both models are trained in
an adversarial zero-sum game until the discriminator is fooled about half of the time, indicating that the generator is
producing plausible examples. The images in the CelebFaces Attributes dataset (CelebA) cover large pose variations
and background clutter. CelebA has large diversity, large quantity, and rich annotations, including 10,177 identities,
202,599 face images, and 5 landmark locations and 40 binary attribute annotations per image.
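For reference, the adversarial game described above is usually written as the standard GAN minimax objective. This formulation is not stated in the text and is given here only as the commonly used form; the symbols p_data (distribution of real images) and p_z (distribution of the latent vector) are notation assumed for this sketch.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

Under this objective, a discriminator that is fooled about half of the time corresponds to D(x) = 1/2 for both real and generated inputs.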


2. METHODOLOGY 


Our proposed method involves three stages:

* Image Data Preprocessing
* Deep Convolutional GAN Architecture used for implementation
* Performance Measure


2.1. Image Data Preprocessing 


* Preprocessing makes the data compatible with our application.
* Training images were semantically aligned and cropped for further processing.
* Each image was resized to 64*64 pixels.
* Pixel values were normalised to the range [-1,1] so that the distribution is zero-centred. This also makes it
easier to compute the L1 and L2 norms in further processing and analysis [5].
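The normalisation step can be illustrated with a minimal, dependency-free sketch. It assumes 8-bit input intensities in [0, 255], which the text does not state explicitly; the function names are hypothetical.

```python
def normalize_pixel(value: int) -> float:
    """Map an 8-bit intensity in [0, 255] to the zero-centred range [-1, 1]."""
    return value / 127.5 - 1.0


def denormalize_pixel(value: float) -> int:
    """Map a value in [-1, 1] back to an 8-bit intensity."""
    return int(round((value + 1.0) * 127.5))


def normalize_image(image):
    """Normalise a nested list image[row][col][channel] of 8-bit values."""
    return [[[normalize_pixel(c) for c in pixel] for pixel in row] for row in image]


# Toy 1x2 RGB image: one black pixel and one white pixel.
tiny = [[[0, 0, 0], [255, 255, 255]]]
normalized = normalize_image(tiny)
```

With this mapping, mid-grey (127.5) lands at 0, which is what makes the distribution zero-centred.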


2.2. Deep Convolutional GAN Architecture used for implementation 


2.2.1 Loss Function Model- The loss function used for our model is minimised over the latent vector z:

\hat{z} = \arg\min_{z} \{ L_c(z \mid y, M) + L_p(z) \}    (1)

where y is the corrupted image and M is a binary mask with the same size as the image.

The cumulative loss function combines two loss terms: the weighted context loss L_c and the prior loss L_p.
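To make the roles of y and M concrete, the sketch below builds a corrupted image from a mask and evaluates a context term on the known pixels only. It is a simplification under stated assumptions: the mask convention (1 = known pixel, 0 = missing) is assumed, and the context term here is an unweighted masked L1 difference, because the weighting scheme of L_c is not specified in the text.

```python
def apply_mask(image, mask):
    """Return the corrupted image y = M * x (element-wise) for 2-D grayscale lists.

    Assumed convention: mask value 1 keeps a pixel, 0 marks it as missing.
    """
    return [[m * p for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def masked_l1(generated, corrupted, mask):
    """Unweighted context term: sum of |G(z) - y| over the known pixels only."""
    total = 0.0
    for g_row, y_row, m_row in zip(generated, corrupted, mask):
        for g, y, m in zip(g_row, y_row, m_row):
            total += m * abs(g - y)
    return total


x = [[0.2, 0.4], [0.6, 0.8]]   # toy 2x2 "image"
M = [[1, 1], [0, 1]]           # the bottom-left pixel is missing
y = apply_mask(x, M)           # corrupted observation
```

Because the missing pixel is masked out, a generated image is penalised only where it disagrees with the observed part of y; what it places in the hole is left to the prior loss.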
Prior Loss- The prior loss focuses on high-level image feature representations rather than pixel-wise differences. It
encourages the recovered image to be similar to samples drawn from the training set:

L_p(z) = \lambda \log(1 - D(G(z)))    (2)

where λ is a parameter that balances the two losses. z is updated to fool D and make the corresponding generated
image more realistic.
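The prior loss in (2) can be evaluated as in the following sketch. It assumes that D(G(z)) is available as a single probability in (0, 1) that the generated image is real (for a two-class softmax, the "real" class probability); the function names and the small epsilon guard are illustrative additions, not part of the described method.

```python
import math


def prior_loss(d_score: float, lam: float) -> float:
    """L_p(z) = lam * log(1 - D(G(z))), with d_score = D(G(z)) in (0, 1)."""
    eps = 1e-12  # guards against log(0) when the discriminator is fully fooled
    return lam * math.log(max(1.0 - d_score, eps))


def total_loss(context: float, d_score: float, lam: float) -> float:
    """Combined objective from (1): context term plus prior term."""
    return context + prior_loss(d_score, lam)


# A more convincing sample (higher D score) yields a lower prior loss.
fooled = prior_loss(0.9, lam=1.0)
caught = prior_loss(0.1, lam=1.0)
```

Minimising this term over z therefore pushes G(z) toward images that D rates as real, while λ controls how strongly this competes with the context term.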


* The generative model G takes a 100-dimensional vector drawn from a uniform distribution on [-1,1] and
generates a 64*64*3 image.
* For the discriminator model D, the input layer is a 64*64*3 image, followed by a sequence of convolution layers,
each with half the image dimension and twice the number of channels of the preceding layer. The output layer is a
two-class softmax [6].

FIGURE 1. Network architecture

* During the training step, we use Adam for optimization with a value of 0.003.
* In the inpainting step, z is found using Adam, with z constrained to [-1,1] in each iteration.
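The search for z in the inpainting step can be sketched as Adam updates followed by clamping to [-1, 1]. This is a toy illustration only: the real objective is the combined loss in (1), which requires trained G and D networks, so a simple quadratic stands in for it here. The learning rate, step count, and Adam constants are hypothetical values chosen for the example.

```python
import math


def clamp(v, lo=-1.0, hi=1.0):
    """Project a scalar back into [lo, hi]."""
    return max(lo, min(hi, v))


def adam_clamped(grad_fn, z, steps=200, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a function of z with Adam, clamping z to [-1, 1] after every step."""
    z = list(z)
    m = [0.0] * len(z)  # first-moment (mean) estimates
    v = [0.0] * len(z)  # second-moment (uncentred variance) estimates
    for t in range(1, steps + 1):
        g = grad_fn(z)
        for i in range(len(z)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            z[i] = clamp(z[i] - lr * m_hat / (math.sqrt(v_hat) + eps))
    return z


# Stand-in objective: squared distance to a target whose second entry lies outside [-1, 1].
target = [0.3, 1.7]


def toy_grad(z):
    return [2.0 * (zi - ti) for zi, ti in zip(z, target)]


z_hat = adam_clamped(toy_grad, [0.0, 0.0])
```

The first coordinate settles near its unconstrained optimum, while the second is held at the boundary value 1, which is the effect of constraining z in each iteration.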


DC GAN Architecture-

FIGURE 2. DC GAN Network architecture

Generator | Discriminator
Convolution: up-convolution | Standard CNN architecture
Pooling layers: replaced by fractional-strided convolutions | Pooling layers: replaced by strided convolutions
Batch normalisation | Batch normalisation
Activation: ReLU, with Tanh only for the final layer | Activation: Leaky ReLU
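The effect of strided and fractional-strided convolutions on feature-map size can be checked with a little arithmetic. The sketch below assumes 4*4 kernels, stride 2, padding 1, and 64 channels after the first discriminator layer; none of these values are given in the text, and they are chosen only because they halve or double the spatial size exactly.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output size of a strided convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output size of a fractional-strided (transposed) convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel


def discriminator_shapes(size=64, channels=64, min_size=4):
    """Feature-map shapes (H, W, C): spatial size halves, channel count doubles."""
    shapes = [(size, size, 3)]
    while size > min_size:
        size = conv_out(size)
        shapes.append((size, size, channels))
        channels *= 2
    return shapes


def generator_sizes(start=4, target=64):
    """Spatial sizes produced by repeated fractional-strided convolutions."""
    sizes = [start]
    while sizes[-1] < target:
        sizes.append(deconv_out(sizes[-1]))
    return sizes
```

Under these assumptions the discriminator goes 64, 32, 16, 8, 4 in spatial size while the generator runs the same sequence in reverse, ending at the 64*64 output.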


2.3. Performance Measure
The graph below shows the loss of the generator and discriminator networks against the number of iterations
conducted during training.

FIGURE 3. Loss in Generative Adversarial Network Architecture


3. RESULTS AND DISCUSSIONS 


3.1. Visual Comparison 

Our approach is first compared to FMM (Fast Marching Method) with random regular and irregular masks. The results
show that the FMM-based approach is unable to recover adequate image features and produces blurry, noisy
outcomes. We then compare our outcomes to those from GLCIC (Globally and Locally Consistent Image
Completion) and DIP (Deep Image Prior) [7,8].

(a) (b) (c) (d) (e) (f)
FIGURE 4. Inpainting results on the CelebA test dataset with regular missing regions.


FIGURES 4 and 5 compare the results of our proposed technique with those of the FMM-based method, the GLCIC
model, and the DIP model. Column (a) shows the ground truth images and column (b) the masked images. Columns
(c), (d), and (e) show the FMM, GLCIC, and DIP results, respectively, and the last column, (f), shows our results.
Compared with the previous methods, the images generated by our model are more similar to the ground truth
images [9].

(a) (b) (c) (d) (e) (f)

FIGURE 5. Inpainting results on the CelebA test dataset with irregular missing regions.


3.2. Quantitative Comparison

PSNR (Peak Signal-to-Noise Ratio) is the most widely used objective image quality index. It is based on the error
between corresponding pixels, i.e. it is an error-sensitive image quality measure. The higher the PSNR, the more
similar the image is to the original.

SSIM (Structural Similarity Index) assesses the overall similarity of two images based on three factors: luminance,
contrast, and structure. SSIM values typically lie in the range [0, 1]. The greater the SSIM value, the more similar
the images are [10].
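Both metrics can be illustrated with a short, dependency-free sketch. It assumes flat sequences of 8-bit pixel values. The SSIM shown is a global, single-window simplification using the commonly cited constants (0.01 and 0.03 times the dynamic range); practical implementations average SSIM over local windows, so this is not necessarily how the reported values were computed.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equally sized")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def global_ssim(a, b, max_value=255.0):
    """Single-window SSIM combining luminance, contrast, and structure terms."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2  # stabilising constants
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


original = [10, 50, 90, 130]
restored = [12, 47, 95, 128]
quality = (psnr(original, restored), global_ssim(original, restored))
```

PSNR depends only on the pixel-wise error, whereas SSIM compares means, variances, and covariance, which is why the two metrics are reported together.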
FIGURE 6. Quantitative evaluations in terms of PSNR and SSIM at different mask sizes.


TABLE 2. Quantitative results of different methods on the CelebA test dataset.


Method       PSNR (dB)   SSIM

FMM          23.52       0.79
GLCIC        27.81       0.82
DIP          25.46       0.80
GAN (ours)   28.90       0.84


4. CONCLUSIONS AND FUTURE WORKS 

This paper presents a GAN-based approach to image inpainting. In comparison to existing methods that use local
image priors or patches, the proposed adversarial network method learns representations of the training data and can
thus predict meaningful content for corrupted images. Unlike previous inpainting procedures, this method produces
images with crisper edges that appear more realistic. In the future, we intend to improve our model to handle image
inpainting with much more sophisticated missing information, and to compare our results with those of a wider
variety of methods. Several further improvements are possible. Batch normalisation can cause problems when
combined with different hole sizes, affecting the activation distribution and picture boundaries (particularly when
applying trained networks to higher resolutions); this can be avoided by adopting two-phase training, i.e. training
first and then fine-tuning with the encoder frozen. High-resolution inpainting may be limited by CPU and memory
constraints, so either the model can be optimised using quantization or the inference code can be optimised. The
large-mask problem can be addressed by expanding the receptive field, increasing the convolutional filter size, or
using a multi-scale technique. Post-processing, such as blending with similar high-resolution patches from the
surrounding area, is also possible. Mask generation based on semantic segmentation, salient object detection, and
facial landmark detection can be used to automate scene editing and to build customised training sets based on
facial landmarks.